69 research outputs found

    Kangaroo – A pattern-matching program for biological sequences

    BACKGROUND: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair-deficient cells. RESULTS: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. CONCLUSION: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
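
The kind of query described in this abstract, a regular expression search for mononucleotide repeats in a DNA sequence, can be sketched in a few lines of Python. This is only an illustration and not Kangaroo's own code or query syntax: the function name `find_repeats`, the minimum run length of 8, and the example sequence are all assumptions made for the sketch.

```python
import re

# Assumed threshold for illustration: a run of 8 or more identical nucleotides.
# ([ACGT]) captures one base; \1{7,} requires at least 7 more copies of it.
MONONUCLEOTIDE_REPEAT = re.compile(r"([ACGT])\1{7,}")


def find_repeats(sequence, pattern=MONONUCLEOTIDE_REPEAT):
    """Return (start, end, matched_text) for each non-overlapping match.

    Positions are 0-based and the end index is exclusive.
    """
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]


# Made-up example sequence containing a run of 10 A's and a run of 9 T's.
example = "ATGGCC" + "A" * 10 + "GGC" + "T" * 9 + "CGA"
hits = find_repeats(example)

for start, end, text in hits:
    print(f"{text[0]} x {end - start} at positions {start}..{end - 1}")
```

Any other pattern, such as a protein motif, could be passed through the `pattern` argument in the same way.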

    Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites

    mirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their down-regulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites.

    Computational Analysis of Mouse piRNA Sequence and Biogenesis

    The recent discovery of a new class of 30-nucleotide-long RNAs in mammalian testes, called PIWI-interacting RNA (piRNA), with similarities to microRNAs and repeat-associated small interfering RNAs (rasiRNAs), has raised puzzling questions regarding their biogenesis and function. We report a comparative analysis of currently available piRNA sequence data from the pachytene stage of mouse spermatogenesis that sheds light on their sequence diversity and mechanism of biogenesis. We conclude that (i) there are at least four times as many piRNAs in mouse testes as currently known; (ii) piRNAs, which originate from long precursor transcripts, are generated by quasi-random enzymatic processing that is guided by a weak sequence signature at the piRNA 5′ ends, resulting in a large number of distinct sequences; and (iii) many of the piRNA clusters contain inverted repeat segments capable of forming double-strand RNA fold-back segments that may initiate piRNA processing analogous to transposon silencing.

    A Comparison of Homogenization vs. Enzymatic Lysis for Microbiome Profiling in Clinical Endoscopic Biopsy Tissue Samples

    Identification of the human microbiome has proven to be of utmost importance with the emerging role of bacteria in various physiological and pathological processes. High-throughput sequencing strategies have evolved to assess the composition of the microbiome. To identify possible bias that may exist in the processing of tissue for whole genome sequencing (WGS), it is important to evaluate the effect of the extraction method on the overall microbial content and composition. Here we compare two different methods of extraction, homogenization vs. enzymatic lysis, on gastric, esophageal and colorectal biopsies and survey the microbial content and composition using WGS and quantitative PCR (qPCR). We examined total bacterial content using universal 16S rDNA qPCR as well as the abundance of three phyla (Actinobacter, Firmicutes, Bacteroidetes) and one genus (Fusobacterium). We found minimal differences between the two extraction methods in the overall community structure. Furthermore, based on our qPCR analysis, neither method demonstrated preferential extraction of any particular clade of bacteria, nor significantly altered the detection of Gram-positive or Gram-negative organisms. However, although the overall microbial composition remained very similar and the most prevalent bacteria could be detected effectively using either method, the precise community structure and microbial abundances differed between the two methods, primarily due to variations in detection of low-abundance genera. We also demonstrate that the homogenization extraction method provides higher microbial DNA content and higher read counts from human tissue biopsy samples of the gastrointestinal tract.

    SeqHound: biological sequence and structure database as a platform for bioinformatics research

    BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit

    Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information

    Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data

    A large number of computational methods have been developed for analyzing differential gene expression in RNA-seq data. We describe a comprehensive evaluation of common methods using the SEQC benchmark dataset and ENCODE data. We consider a number of key features, including normalization, accuracy of differential expression detection and differential expression analysis when one condition has no detectable expression. We find significant differences among the methods, but note that array-based methods adapted to RNA-seq data perform comparably to methods designed for RNA-seq. Our results demonstrate that increasing the number of replicate samples improves detection power significantly more than increasing sequencing depth.

    Neurophysiological Defects and Neuronal Gene Deregulation in Drosophila mir-124 Mutants

    miR-124 is conserved in sequence and neuronal expression across the animal kingdom and is predicted to have hundreds of mRNA targets. Diverse defects in neural development and function were reported from miR-124 antisense studies in vertebrates, but a nematode knockout of mir-124 surprisingly lacked detectable phenotypes. To provide genetic insight from Drosophila, we deleted its single mir-124 locus and found that it is dispensable for gross aspects of neural specification and differentiation. On the other hand, we detected a variety of mutant phenotypes that were rescuable by a mir-124 genomic transgene, including short lifespan, increased dendrite variation, impaired larval locomotion, and aberrant synaptic release at the NMJ. These phenotypes reflect extensive requirements of miR-124 even under optimal culture conditions. Comparison of the transcriptomes of cells from wild-type and mir-124 mutant animals, purified on the basis of mir-124 promoter activity, revealed broad upregulation of direct miR-124 targets. However, in contrast to the proposed mutual exclusion model for miR-124 function, its functional targets were relatively highly expressed in miR-124–expressing cells and were not enriched in genes annotated with epidermal expression. A notable aspect of the direct miR-124 network was coordinate targeting of five positive components in the retrograde BMP signaling pathway, whose activation in neurons increases synaptic release at the NMJ, similar to mir-124 mutants. Derepression of the direct miR-124 target network also had many secondary effects, including over-activity of other post-transcriptional repressors and a net incomplete transition from a neuroblast to a neuronal gene expression signature. Altogether, these studies demonstrate complex consequences of miR-124 loss on neural gene expression and neurophysiology